Here is the data: atp_tennis, containing the following columns: ['Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Player_1', 'Player_2', 'Winner', 'Rank_1', 'Rank_2', 'Pts_1', 'Pts_2', 'Odd_1', 'Odd_2', 'score'].
--- The description of each column in this data is:
Tournament: Name of the tennis tournament (Brisbane International, Chennai Open, Qatar Exxon Mobil Open ...etc)
Date: Date the match was played (year-month-day)
Series: Category or level of the tennis tournament (ATP250, ATP500, Masters1000 and Grand Slams offer 250, 500, 1000, and 2000 ranking points to the winner, respectively.)
Court: Place the match was held (Indoors or Outdoors)
Surface: Type of court surface (Hard, Grass and Clay)
Round: Stage of the tournament (1st Round, 2nd Round, Quarterfinals, Semifinal and The Final)
Best of: Tournament match format ("best of 3" or "best of 5")
Player_1: Name of the first competitor in each match
Player_2: Name of the second competitor in each match
Winner: Name of the Player_1 or Player_2 who won the match 
Rank_1: World rankings of the Player_1 at the time of the match
Rank_2: World rankings of the Player_2 at the time of the match
Pts_1: Ranking points held by Player_1 before the match (accumulated over the season and higher points typically indicate a more successful player)
Pts_2: Ranking points held by Player_2 before the match
Odd_1: Betting odds for Player_1 (reflecting their perceived chances of winning)
Odd_2: Betting odds for Player_2 
Score: Final score of the match ("score of Player_1" - "score of Player_2")
---
Consider the contents of the dataset and the requirements from the user. Please note DO NOT CHANGE FILE AND VARIABLE NAMES THAT I HAVE SET!
Interactions begin:
[USER (data scientist)]: Hey Tapilot, I'm working on this atp_tennis dataset and I could use a hand. Can you help me figure out if there are any missing values or weird data stuff going on? Please generate a value to show the number of missing values in this dataset.

[YOU (AI assistant)]: No problem, happy to help! Let's start by finding any missing values in the dataset. Here's some code for that :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, sum_up, is_null

atp_tennis = read_csv_file('atp_tennis.csv')

missing_values = sum_up(is_null(atp_tennis))

print(missing_values)

'''
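
For reference, here is a minimal plain-pandas sketch of the same check. It assumes that `is_null` and `sum_up` behave like `DataFrame.isnull()` and `.sum()`; the `decision_company` helpers are private, so that mapping is an assumption. The `toy` frame is made-up data, not rows from atp_tennis.

```python
import pandas as pd

# Hypothetical stand-in for atp_tennis (made-up rows, for illustration only)
toy = pd.DataFrame({
    "Surface": ["Hard", None, "Clay"],
    "Rank_1": [5, 12, None],
    "Rank_2": [8, None, 30],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# One overall value: total missing cells in the whole frame
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```

The per-column Series shows where the gaps are; the single total is handy when only one number is requested.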

[USER (data scientist)]: Awesome, thanks! Now, I wanna know how many unique values there are for a few columns like Tournament, Series, Court, Surface, and Round. Can you help me out with that? Please generate a Series summarizing the unique value counts for selected columns in this dataset.

[YOU (AI assistant)]: You got it! Here's the code to find the unique values for those columns :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, n_unique

atp_tennis = read_csv_file('atp_tennis.csv')

unique_values = n_unique(atp_tennis, ['Tournament', 'Series', 'Court', 'Surface', 'Round'])

print(unique_values)

'''

[USER (data scientist)]: Sweet! Now, I wanna make a data quality report that shows both the missing values and unique values for each column. Can you help me whip that up? Please generate a dataframe called data quality report showing the number of missing values and unique values in each column of the dataset.

[YOU (AI assistant)]: Absolutely! Let's create a DataFrame to show the data quality report. Here's the code for that :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, create_dataframe

atp_tennis = read_csv_file('atp_tennis.csv')

data_quality_report = create_dataframe({'Missing Values': missing_values, 'Unique Values': unique_values})

print(data_quality_report)

'''
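
As a rough illustration of the report's shape, the sketch below builds the same two-column table with plain pandas on made-up data. It assumes `create_dataframe` behaves like the `pd.DataFrame` constructor, which is an assumption about the private helper. Unique counts are computed here for every column so that both columns of the report share the same index.

```python
import pandas as pd

# Hypothetical stand-in for atp_tennis (made-up rows)
toy = pd.DataFrame({
    "Surface": ["Hard", None, "Clay", "Hard"],
    "Round": ["1st Round", "1st Round", "The Final", None],
})

# Both Series are indexed by column name, so they align row by row
report = pd.DataFrame({
    "Missing Values": toy.isnull().sum(),
    "Unique Values": toy.nunique(),  # nunique ignores missing values by default
})

print(report)
```

If the unique counts cover only a subset of columns, pandas aligns on the index and leaves NaN in the remaining rows of that column.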

[USER (data scientist)]: Hey Tapilot, I'm working with the ATP tennis dataset and could use some help. Can you give me a hand? 

[YOU (AI assistant)]: Absolutely! I'd be happy to help. What do you need assistance with? 

[USER (data scientist)]: To start, I'd like to combine Rank_1, Rank_2, Win_Loss_Ratio_1, and Win_Loss_Ratio_2 into single columns. Please generate a dataframe called combined data report showing the win/loss ratios and ranks of players in this dataset.

[YOU (AI assistant)]: Sure thing! Here's the code to merge those columns :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, count_unique_values, transform, generate_summary_stat, concatenate_objects, create_dataframe

atp_tennis = read_csv_file('atp_tennis.csv')

# Count the number of wins for each player:
player_wins = count_unique_values(atp_tennis['Winner'])

# Count the total number of matches played by each player:
player_matches = count_unique_values(atp_tennis['Player_1']) + count_unique_values(atp_tennis['Player_2'])

# Calculate win/loss ratio for each player:
win_loss_ratios = player_wins / (player_matches - player_wins)

# Add win/loss ratios to the dataset:
atp_tennis['Win_Loss_Ratio_1'] = transform(atp_tennis['Player_1'], win_loss_ratios)
atp_tennis['Win_Loss_Ratio_2'] = transform(atp_tennis['Player_2'], win_loss_ratios)

# Descriptive Statistics Table:
desc_stats = generate_summary_stat(atp_tennis[['Rank_1', 'Rank_2', 'Win_Loss_Ratio_1', 'Win_Loss_Ratio_2']])

# Combine Rank_1, Rank_2, Win_Loss_Ratio_1, and Win_Loss_Ratio_2 into single columns:
combined_ranks = concatenate_objects(atp_tennis['Rank_1'], atp_tennis['Rank_2'])
combined_win_loss_ratios = concatenate_objects(atp_tennis['Win_Loss_Ratio_1'], atp_tennis['Win_Loss_Ratio_2'])

# Create a DataFrame with combined ranks and win/loss ratios:
combined_data = create_dataframe({'Rank': combined_ranks, 'Win_Loss_Ratio': combined_win_loss_ratios})

print(combined_data)

'''
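
To make the win/loss arithmetic concrete, here is a small sketch in plain pandas on made-up matches. It assumes `count_unique_values` corresponds to `Series.value_counts()`, which is an assumption about the private helper. In plain pandas, adding two Series with `+` yields NaN for any player who appears in only one of the two columns, so this sketch uses `.add(..., fill_value=0)` instead; that is a variation for illustration, not the code above.

```python
import pandas as pd

# Made-up matches for three hypothetical players
toy = pd.DataFrame({
    "Player_1": ["A", "A", "B"],
    "Player_2": ["B", "C", "C"],
    "Winner":   ["A", "A", "C"],
})

# Matches played: appearances as Player_1 plus appearances as Player_2
matches = toy["Player_1"].value_counts().add(
    toy["Player_2"].value_counts(), fill_value=0
)

# Wins per player; players who never won get 0 instead of being absent
wins = toy["Winner"].value_counts().reindex(matches.index, fill_value=0)

# Win/loss ratio = wins / losses, where losses = matches - wins
ratios = wins / (matches - wins)

print(ratios)
```

A player with no losses gets an infinite ratio, so a later cleaning step may need to handle both NaN and inf values.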

[USER (data scientist)]: Awesome! Now, can you show me some basic stats for this combined data? Please generate a DataFrame containing descriptive statistics for the ATP tennis dataset.

[YOU (AI assistant)]: You got it! Here's the code to display the descriptive statistics :
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file, generate_summary_stat

atp_tennis = read_csv_file('atp_tennis.csv')

# Descriptive Statistics Table:
desc_stats = generate_summary_stat(combined_data)

print(desc_stats)

'''

[USER (data scientist)]: Thanks a bunch! How about a histogram for player rankings distribution? Please generate and display a histogram showing the distribution of player rankings in the ATP tennis dataset.

[YOU (AI assistant)]: No problem! Here's the code to create a histogram using matplotlib :
'''
import pandas as pd
import matplotlib.pyplot as plt 
import pickle
from decision_company import read_csv_file, create_subplots, create_histogram_subplot, set_plot_split_title, make_xlabel, save_plot, show_plots

atp_tennis = read_csv_file('atp_tennis.csv')

# Histogram for Player Rankings Distribution:
fig1, ax1 = create_subplots()
create_histogram_subplot(ax1, combined_ranks, bins=50, alpha=0.5)
set_plot_split_title(ax1, 'Player Rankings Distribution')
make_xlabel(ax1, 'Ranking')
save_plot('./ref_result/hist_chart.png')
show_plots() 
'''

[USER (data scientist)]: That's cool. Now, I'd like to see a scatter plot of player rankings vs win/loss ratios. Please generate and display a scatter plot that compares player rankings with their win/loss ratios in the ATP tennis dataset.

[YOU (AI assistant)]: Sure thing! Here's the code to create a scatter plot using matplotlib :
'''
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from decision_company import read_csv_file, create_subplots, set_plot_split_title, show_plots, make_xlabel, save_plot, create_scatter_plot, make_ylabel

atp_tennis = read_csv_file('atp_tennis.csv')

# Scatter Plot for Player Rankings vs Win/Loss Ratios:
fig2, ax2 = create_subplots()
create_scatter_plot(ax2, combined_data['Rank'], combined_data['Win_Loss_Ratio'], alpha=0.5)
set_plot_split_title(ax2, 'Player Rankings vs Win/Loss Ratios')
make_xlabel(ax2, 'Ranking')
make_ylabel(ax2, 'Win/Loss Ratio')
save_plot('./ref_result/scatter_chart.png')
show_plots()

'''

[USER (data scientist)]: Lastly, can you whip up a bar chart showing the number of matches played on different surface types? Please generate and display a bar chart showing the number of matches played on different surface types in the ATP tennis dataset.

[YOU (AI assistant)]: Of course! Here's the code to create a bar chart using pandas and matplotlib :
'''
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from decision_company import read_csv_file, count_unique_values, create_subplots, set_plot_split_title, show_plots, make_xlabel, save_plot, make_ylabel, create_bar_chart

atp_tennis = read_csv_file('atp_tennis.csv')

# Bar Chart for Surface Types:
fig3, ax3 = create_subplots()
surface_counts = count_unique_values(atp_tennis['Surface'])
create_bar_chart(ax3, surface_counts)
set_plot_split_title(ax3, 'Matches by Surface Type')
make_xlabel(ax3, 'Surface')
make_ylabel(ax3, 'Number of Matches')
save_plot('./ref_result/bar_chart.png')
show_plots()

'''

[USER (data scientist)]: Hey, I've got this ATP tennis dataset, and I'm curious if there's any connection between the court surfaces (hard, clay, grass) and how well the players perform (like their win/loss ratios and rankings). Can you help me out? Please generate a DataFrame that includes selected columns related to surface type and player rankings from the ATP tennis dataset.

[YOU (AI assistant)]: Absolutely! Let's start by making a DataFrame with the court surfaces, player rankings, and win/loss ratios. Here's the code for that :
'''
import pandas as pd
import numpy as np
import pickle
from decision_company import read_csv_file, df_copy

atp_tennis = read_csv_file('atp_tennis.csv')

# Create a DataFrame with surface types, player rankings, and win/loss ratios
surface_data = df_copy(atp_tennis[['Surface', 'Rank_1', 'Rank_2', 'Win_Loss_Ratio_1', 'Win_Loss_Ratio_2']])

print(surface_data)

'''

[USER (data scientist)]: Cool, now let's clean up the data by getting rid of any rows with missing or wonky data in the Win_Loss_Ratio columns. Please generate a DataFrame that removes rows with missing values in 'Win_Loss_Ratio_1' and 'Win_Loss_Ratio_2' from a subset of the ATP tennis dataset.

[YOU (AI assistant)]: You got it! Here's the code to clean up the data :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, df_copy, dropna

atp_tennis = read_csv_file('atp_tennis.csv')

# Remove rows with missing or invalid data in the Win_Loss_Ratio columns
surface_data_clean = df_copy(dropna(surface_data, subset_columns=['Win_Loss_Ratio_1', 'Win_Loss_Ratio_2']))

print(surface_data_clean)

'''

[USER (data scientist)]: Great, now let's do some one-hot encoding for the court surfaces. Please generate a DataFrame with one-hot encoded values for the 'Surface' column in a subset of the ATP tennis dataset.

[YOU (AI assistant)]: Sure thing! Here's the code to do one-hot encoding for the court surfaces :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, get_dummies

atp_tennis = read_csv_file('atp_tennis.csv')

# One-hot encoding for surface types
surface_data_clean = get_dummies(surface_data_clean, columns=['Surface'], prefix='', prefix_sep='')

print(surface_data_clean)

'''
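
Here is what that encoding step looks like on a tiny made-up frame, assuming the `get_dummies` helper forwards its arguments to `pd.get_dummies` (an assumption, since the helper is private). With an empty `prefix` and `prefix_sep`, the new columns are named after the surface values themselves.

```python
import pandas as pd

# Made-up rows; Avg_Rank is only there to show that other columns are kept
toy = pd.DataFrame({
    "Surface": ["Hard", "Clay", "Grass", "Hard"],
    "Avg_Rank": [10.0, 25.5, 40.0, 7.5],
})

# One indicator column per surface value; the original 'Surface' column is dropped
encoded = pd.get_dummies(toy, columns=["Surface"], prefix="", prefix_sep="")

print(encoded)
```

Each row has exactly one true indicator, which is why later steps can select a surface with a filter such as `encoded['Hard'] == 1`.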

[USER (data scientist)]: Now, let's find the average player ranking and win/loss ratio for each match. Please generate a DataFrame that calculates and includes the average rank and average win/loss ratio for each row in a subset of the ATP tennis dataset.

[YOU (AI assistant)]: You got it! Here's the code to calculate those averages :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file
atp_tennis = read_csv_file('atp_tennis.csv')

# Calculate the average player ranking and win/loss ratio for each match
surface_data_clean.loc[:, 'Avg_Rank'] = (surface_data_clean['Rank_1'] + surface_data_clean['Rank_2']) / 2
surface_data_clean.loc[:, 'Avg_Win_Loss_Ratio'] = (surface_data_clean['Win_Loss_Ratio_1'] + surface_data_clean['Win_Loss_Ratio_2']) / 2

print(surface_data_clean)

'''

[USER (data scientist)]: Next up, let's make a correlation matrix for the court surfaces, average player rankings, and average win/loss ratios. Please generate a DataFrame that displays the correlation matrix for selected columns, including surface types and average statistics, in a subset of the ATP tennis dataset.

[YOU (AI assistant)]: Here's the code to create that correlation matrix :
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, corr

atp_tennis = read_csv_file('atp_tennis.csv')

# Create a Correlation Matrix
corr_matrix = corr(surface_data_clean[['Hard', 'Clay', 'Grass', 'Avg_Rank', 'Avg_Win_Loss_Ratio']])

print(corr_matrix)

'''

[USER (data scientist)]: Now, let's do an ANOVA test to see if there are any big differences in the average win/loss ratios between the court surfaces. Please conduct an ANOVA test to compare the average win/loss ratios across different surface types in the ATP tennis dataset.

[YOU (AI assistant)]: You got it! Here's the code to run the ANOVA test :
'''
import pandas as pd
import scipy.stats as stats
import pickle
from decision_company import read_csv_file, f_oneway

atp_tennis = read_csv_file('atp_tennis.csv')

# Perform Statistical Tests
anova_result_clean = f_oneway(surface_data_clean['Avg_Win_Loss_Ratio'][surface_data_clean['Hard'] == 1],
                              surface_data_clean['Avg_Win_Loss_Ratio'][surface_data_clean['Clay'] == 1],
                              surface_data_clean['Avg_Win_Loss_Ratio'][surface_data_clean['Grass'] == 1])

print(anova_result_clean)

'''
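
For a sense of what the test returns, here is a small sketch that calls `scipy.stats.f_oneway` directly on made-up ratios. It assumes the `f_oneway` helper wraps the SciPy function of the same name, which is an assumption. The numbers below are invented for illustration and say nothing about the real dataset.

```python
from scipy import stats

# Invented average win/loss ratios for three groups of matches
hard = [1.10, 1.25, 0.95, 1.30]
clay = [0.90, 1.05, 1.00, 0.85]
grass = [1.40, 1.20, 1.35, 1.50]

# One-way ANOVA: null hypothesis is that all three groups share the same mean
result = stats.f_oneway(hard, clay, grass)

print("F statistic:", result.statistic)
print("p-value:", result.pvalue)
```

A small p-value suggests that at least one surface's mean differs from the others; it does not say which one.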

[USER (data scientist)]: Lastly, let's find confidence intervals for the average win/loss ratio on each court surface. As a first step, calculate the mean and standard error of the mean for the average win/loss ratio for each surface type. Please generate and display a series of values representing the mean and standard error of the mean (SEM) for average win/loss ratios on hard, clay, and grass surfaces in the ATP tennis dataset.

My template of code snippet is:

---BEGIN CODE TEMPLATE---

import pandas as pd
import scipy.stats as stats
import pickle
from decision_company import read_csv_file,

# please import the necessary private functions from decision_company first

atp_tennis = read_csv_file('atp_tennis.csv')

# YOUR SOLUTION BEGIN:
<code1>
[COMPLETE YOUR CODE]
</code1>
# YOUR SOLUTION END

print(hard_mean)

# save data
pickle.dump(hard_mean,open("./ref_result/hard_mean.pkl","wb"))

# YOUR SOLUTION BEGIN:
<code2>
[COMPLETE YOUR CODE]
</code2>
# YOUR SOLUTION END

print(hard_sem)

# save data
pickle.dump(hard_sem,open("./ref_result/hard_sem.pkl","wb"))

# YOUR SOLUTION BEGIN:
<code3>
[COMPLETE YOUR CODE]
</code3>
# YOUR SOLUTION END

print(clay_mean)

# save data
pickle.dump(clay_mean,open("./ref_result/clay_mean.pkl","wb"))

# YOUR SOLUTION BEGIN:
<code4>
[COMPLETE YOUR CODE]
</code4>
# YOUR SOLUTION END

print(clay_sem)

# save data
pickle.dump(clay_sem,open("./ref_result/clay_sem.pkl","wb"))

# YOUR SOLUTION BEGIN:
<code5>
[COMPLETE YOUR CODE]
</code5>
# YOUR SOLUTION END

print(grass_mean)

# save data
pickle.dump(grass_mean,open("./ref_result/grass_mean.pkl","wb"))

# YOUR SOLUTION BEGIN:
<code6>
[COMPLETE YOUR CODE]
</code6>
# YOUR SOLUTION END

print(grass_sem)

# save data
pickle.dump(grass_sem,open("./ref_result/grass_sem.pkl","wb"))

---END CODE TEMPLATE---

Please note that you have to generate the WHOLE Python code, not just code segments, based on the code snippet, using Pandas version 2.0.3 and Matplotlib version 3.7.4. You must keep all comments in the code snippet unchanged.

[YOU (AI assistant)]: Here's the code to get those stats for each court surface: 
'''
import pandas as pd
import scipy.stats as stats
import pickle
from decision_company import read_csv_file,

# please import the necessary private functions from decision_company first

atp_tennis = read_csv_file('atp_tennis.csv')

# YOUR SOLUTION BEGIN:
